Molecular Ecology Resources — Latest Matching Preprints

1

TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces): An Optimized Algorithm for Vertebrate Taxonomic Assignments in eDNA Metabarcoding, Integrating Molecular, Taxonomic, and Ecological Criteria

Haderle, R.; Jung, G.; Riou, M.; Ung, V.; Jung, J.-L.

2026-07-09 molecular biology 10.64898/2026.06.29.735257 medRxiv

Top 0.1%

59.6%

Environmental DNA (eDNA) metabarcoding has become a powerful approach for large-scale biodiversity assessment, yet taxonomic assignment remains one of its most critical error-prone steps. Current bioinformatic pipelines rely on molecular similarity searches against reference databases, but assignment accuracy is constrained not only by short marker length and database incompleteness, but also by fundamental limitations, including recent species radiations, incomplete lineage sorting, introgression, NUMTs, and the imperfect correspondence between genetic variation and species boundaries. Here, we present TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces), an automated and simple protocol designed to improve taxonomic assignments in eDNA metabarcoding. Initially developed for marine vertebrates, TRIDENT may be used with any barcode and integrates three complementary sources of evidence: molecular similarity (NCBI/GenBank and BOLD), curated taxonomic information (WoRMS), and ecological plausibility derived from biogeographic occurrence data (GBIF). The workflow sequentially constructs candidate taxon lists based on sequence similarity, expands them through taxonomic hierarchies, and filters them using spatial occurrence constraints. It further identifies possible taxa lacking reference barcodes and evaluates their plausibility through CO1-based similarity if data exist in BOLD. TRIDENT has been implemented as a source-available Python tool and tested using empirical eDNA datasets from marine vertebrates as well as simulated communities. Results demonstrate that the tool produces taxonomic assignments consistent with expert manual curation while substantially reducing processing time and attention errors caused by manual processing of large datasets. 
By combining molecular, taxonomic, and ecological criteria within a single framework, TRIDENT improves transparency and reproducibility and provides a robust and flexible solution strengthening confidence in taxonomic identifications in eDNA-based biodiversity assessments.
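
The three-stage workflow described in this abstract (similarity-based candidate lists, expansion through the taxonomic hierarchy, filtering by spatial occurrence) can be sketched as below. This is a minimal illustration and not TRIDENT's own code: the function name `assign_candidates`, the 97% identity threshold, the genus-level expansion, and the toy dictionaries standing in for GenBank/BOLD hits, WoRMS hierarchies, and GBIF occurrences are all assumptions, and the taxon names are invented.

```python
def assign_candidates(hits, genus_members, occurs_in_region, min_identity=97.0):
    """Return (plausible taxa, plausible taxa lacking a reference hit).

    hits             -- dict: species name -> percent identity
                        (hypothetical stand-in for similarity-search results)
    genus_members    -- dict: genus -> set of species
                        (hypothetical stand-in for a curated taxonomy)
    occurs_in_region -- set of species recorded in the study area
                        (hypothetical stand-in for occurrence data)
    min_identity     -- assumed similarity threshold, not taken from the preprint
    """
    # Step 1: candidate list from sequence similarity.
    candidates = {sp for sp, ident in hits.items() if ident >= min_identity}

    # Step 2: expand each candidate to its congeners via the taxonomy.
    expanded = set(candidates)
    for species in candidates:
        genus = species.split()[0]
        expanded |= genus_members.get(genus, set())

    # Step 3: keep only taxa that are recorded in the study region.
    plausible = expanded & occurs_in_region

    # Taxa that are plausible but had no reference sequence among the hits.
    unreferenced = plausible - set(hits)
    return sorted(plausible), sorted(unreferenced)


# Toy data with invented taxon names.
hits = {"Exemplus alpha": 99.1, "Exemplus beta": 98.4, "Alterus delta": 88.0}
genus_members = {
    "Exemplus": {"Exemplus alpha", "Exemplus beta", "Exemplus gamma"},
    "Alterus": {"Alterus delta"},
}
occurs_in_region = {"Exemplus alpha", "Exemplus gamma", "Alterus delta"}

plausible, unreferenced = assign_candidates(hits, genus_members, occurs_in_region)
print(plausible)
print(unreferenced)
```

In this toy case the congener without a reference hit survives the occurrence filter, which is the kind of taxon the abstract says is then evaluated further when CO1 data exist in BOLD.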

2

Let the prey speak: Using PNA clamps to silence predator DNA in marine faecal diet studies

Polanowski, A. M.; Suter, L.; Deagle, B. E.; McInnes, J. C.

2026-07-08 molecular biology 10.64898/2026.06.22.733645 medRxiv

Top 0.1%

54.1%

DNA metabarcoding of faeces is a powerful, non-invasive method for assessing predator diets. However, when studying the diet of generalist predators, broad PCR primers are used to amplify the wide range of potential prey species and metabarcoding outputs are often dominated by sequences from the predator. While blocking primers can be used to reduce PCR amplification of predator DNA, they frequently cause partial predator suppression and unintended prey blocking. Peptide nucleic acid (PNA) clamps offer a promising, underutilised alternative by binding strongly and selectively to predator DNA to block its PCR amplification. In this study we designed and validated a novel PNA clamp targeting the 18S rRNA gene to suppress bird and mammal predator DNA in dietary samples. We tested this clamp on tissue mixtures and faecal samples from three seabird and two seal species across temperate, subantarctic, and Antarctic regions. The PNA clamp substantially increased the proportion of prey reads recovered while maintaining consistent prey community composition across all predator species. Our results not only demonstrate the general effectiveness of PNA clamps over standard blocking primers, but also provide a powerful, broadly applicable new tool to improve accuracy in DNA diet metabarcoding studies.

3

An evaluation of clustering and assembly strategies from Iso-Seq data in the absence of reference genomes in non-model animals

Eleftheriadi, K.; Vazquez-Valls, M.; Fernandez, R.

2026-07-08 evolutionary biology 10.1101/2025.09.18.677004 medRxiv

Top 0.1%

14.7%

Transcriptome assembly enables the recovery of expressed genes and isoforms, but the optimal strategy for reconstructing transcriptomes from long-read sequencing remains unresolved. In particular, establishing best practices for generating accurate gene models and selecting representative isoforms is essential for comparative genomics, as for orthology inference typically only the longest isoform per gene model is included. Here, we systematically compare clustering and de novo assembly methods using PacBio Iso-Seq data from 17 animal lineages spanning seven phyla, most of them non-model species, with the goal of investigating which methodology is more adequate to select one isoform per gene model, in the absence of specific pipelines to do so. We evaluate four approaches: isoseq cluster, CD-HIT, RNA-Bloom2 and isONform. We benchmark them with short reads using Trinity, assessing assembly quality with BUSCO completeness, short-read mapping rates, coding sequence recovery, and longest isoform prediction. Our results show that CD-HIT clustering at high similarity thresholds (≥99%) yields the most complete and coding-rich long-read transcriptomes, rivaling Trinity while avoiding its high redundancy. Consensus-based methods such as isoseq cluster and isONform recover fewer single-copy orthologs (mirrored in a lower BUSCO score) and achieve lower mapping rates, while RNA-Bloom2 provides intermediate performance with reduced duplication. Together, these findings establish, to date, CD-HIT as a robust and practical strategy for transcriptome reconstruction from long-read data when genomic references are unavailable. By benchmarking de novo methods across a taxonomically broad dataset, this work defines the realistic capabilities of long-read transcriptome reconstruction in the absence of a reference genome and provides practical guidance for deriving high-quality gene models and selecting representative isoforms for orthology inference in non-model species.

4

Molecular Clock Dating of Ancient Environmental DNA Reveals Damage Beyond Deamination

Lemmon-Kishi, M.; Pipes, L.; De Sanctis, B.; Nielsen, R.

2026-07-07 bioinformatics 10.64898/2026.07.03.735781 medRxiv

Top 0.2%

10.8%

Ancient environmental DNA (aeDNA) from permafrost, lake, cave, and marine sediments provides a rich source of genetic data that captures broad perspectives of past biodiversity. Accurate dating is crucial for discovering ecologically relevant patterns from aeDNA, and molecular clock dating would allow for sample ages to be estimated from the recovered genetic material itself instead of the geological components. However, the fragmented and damaged nature of short-read ancient DNA (aDNA) from multiple taxonomic sources poses significant challenges and has limited this dating approach for aeDNA. Here we developed ratePlacer, a phylogeny-based method for analyzing aeDNA that can combine information from many short reads in a sample while accounting for DNA damage to provide maximum likelihood estimates of sample ages. Simulations demonstrate that ratePlacer accurately dates samples even under the fragmented, damaged conditions characteristic of aeDNA and outperforms Bayesian tip-dating approaches for taxonomically mixed samples commonly found in aeDNA. Yet age estimates from re-dating Kap Kobenhavn varied across taxa, highlighting the difficulty of molecular clock dating in aeDNA. This dating also revealed elevated G→T and C→A mismatches consistent with oxidative damage. These patterns reveal aDNA damage beyond deamination that remains understudied, suggesting that aeDNA should be carefully evaluated in genomic and evolutionary analyses. The new dating method, ratePlacer, extends molecular clock dating of aDNA from single-specimen to pooled environmental DNA data, where traditional methods struggle.

5

Two-tower models for genomic prediction of reproductive outcomes and sex-specific fertility liabilities: simulation insights

Pappas, F.; Palaiokostas, C.; Debes, P. V.; Johnsson, M.

2026-07-09 genetics 10.64898/2026.07.03.736358 medRxiv

Top 0.2%

9.6%

Many biological characteristics arise by interactions between more than one biological organism or unit. Fertilization success in sexually reproducing species represents such an extended phenotype where both mates are required to be fertile for a successful outcome. Consequently, predictive models should account for the joint nature of reproductive performance while offering interpretable estimates for individual mate contributions. Recent advances in genomics and machine learning (ML) provide standardized, high-dimensional genetic information on one hand and computational tools capable of modeling complex biological systems on the other. Here, we construct and evaluate two-tower (TT) machine learning architectures for genomic prediction of binary reproductive outcomes and recovery of sex-specific fertility liabilities. Simulated datasets, generated under a range of genetic architectures, were utilized to compare multilayer perceptron (TT-MLP), convolutional neural network (TT-CNN), and L1-regularized linear (TT-LASSO) two-tower models. Simulation scenarios varied sex-specific heritabilities, genetic correlations, infertility prevalence, mating structure, and sex-specific infertility rates. Models were evaluated with regard to their ability to predict reproductive success at pair level and also recover true underlying genetic values for male and female fertility. Prediction accuracy increased with the underlying heritable component as expected, while sex-specific tower-scores successfully recovered latent fertility liabilities despite models being trained only on observed joint outcomes. TT-LASSO achieved the highest overall classification performance, whereas TT-MLP provided more balanced and consistent recovery of sex-specific genetic values across scenarios. An additional simulation, incorporating genotype-dependent mate compatibility demonstrated advantages of fully-connected neural networks for capturing non-additive interactions. 
These results indicate that two-tower frameworks provide a powerful approach for modeling reproductive traits, enabling simultaneous prediction of aggregate reproductive outcomes and sex-specific fertility liabilities from genotypic information.
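
The two-tower idea in this abstract, one tower per sex producing a fertility score, with only the joint outcome observed, can be illustrated with a forward pass. This is a hedged sketch rather than the authors' architecture: the linear towers, the function names, and in particular the rule for combining the two scores (a product of two logistic probabilities, encoding "both mates must be fertile") are assumptions made for illustration.

```python
import math


def tower_score(genotype, weights, bias=0.0):
    """Linear tower: maps one individual's genotype (0/1/2 allele counts)
    to a scalar fertility score. A linear tower is an assumption here."""
    return bias + sum(g * w for g, w in zip(genotype, weights))


def sigmoid(x):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pair_success_probability(female_geno, male_geno, w_female, w_male):
    """Probability of a successful outcome for one mating pair.

    Assumed combination rule: the pair succeeds only if both mates are
    fertile, so the two tower probabilities are multiplied.
    """
    p_female = sigmoid(tower_score(female_geno, w_female))
    p_male = sigmoid(tower_score(male_geno, w_male))
    return p_female * p_male


# Hypothetical genotypes and weights for three markers.
female = [0, 1, 2]
male = [2, 0, 1]
w_f = [0.5, -0.2, 0.1]
w_m = [0.3, 0.4, -0.6]

print(pair_success_probability(female, male, w_f, w_m))
```

Under this setup the sex-specific scores returned by `tower_score` remain separately inspectable even though training would use only the pair-level outcome, which is the property the abstract highlights.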

6

Simulating population pangenomes under coalescent demographic models with MSpangenome

Piat, L.; Denni, S.; Dubois, S.; Linard, B.; Duvaux, L.

2026-07-03 bioinformatics 10.64898/2026.06.29.735168 medRxiv

Top 0.2%

9.4%

Motivation: Pangenome variation graphs (PVGs) are increasingly used to represent genomic diversity, yet there is currently no general framework for generating population pangenomes directly from explicit evolutionary histories. Existing simulators typically focus on individual classes of variation and do not integrate these variations within a genealogy-aware framework driven by explicit demographic histories. As a result, evaluating pangenome methods in realistic population-genetic settings remains challenging, and benchmark datasets with known evolutionary ground truth are scarce. Results: We present MSpangenome, a genealogy-aware framework that bridges coalescent population genetic simulations and pangenome graph analyses. The pipeline combines ancestry simulation with msprime and a de novo graph construction algorithm to generate PVGs directly from simulated genealogies. By explicitly modeling recombination, demographic history and incomplete lineage sorting, MSpangenome produces structurally complex pangenomes in which nested and overlapping structural variants emerge naturally from the underlying genealogies, while their evolutionary history and graph topology remain known by construction. This provides a general framework for generating realistic population pangenomes and establishing ground-truth datasets for methodological evaluation. We demonstrate its utility by generating population-scale pangenomes and using them as controlled references to benchmark the widely used graph construction tools, PGGB and Minigraph-Cactus. Our analyses reveal contrasting performance regimes across levels of sequence diversity, sample sizes and classes of structural variation, highlighting the value of simulation-based benchmarking for identifying reconstruction errors that are hard to detect using empirical datasets alone. 
Availability and implementation: MSpangenome is implemented in Python, fully containerized, freely available at https://forge.inrae.fr/pangepop/MSpangepop and mirrored at https://github.com/inrae/MSpangepop.

7

How Robust are Multispecies Coalescent Species Delimitations in Taxonomically Complex Systems? A Genomic Assessment Using Mediterranean Tethya Sponges

van der Sprong, J.; Cardone, F.; Hoehna, S.; Schaetzle, S.; Deister, F.; Erpenbeck, D.; Woerheide, G.; Vargas, S.

2026-07-05 evolutionary biology 10.64898/2026.07.04.735074 medRxiv

Top 0.2%

8.7%

Reliable species delimitation underpins biodiversity assessment but remains difficult for organisms with plastic morphology and few diagnostic characters. Multispecies coalescent (MSC) methods can delimit species from genomic data, yet they are rarely tested in taxonomically complex, marine invertebrate groups where they are arguably most needed. We used the three Mediterranean species of the genus Tethya, a rare, well-characterised system within the otherwise taxonomically difficult phylum Porifera (distinguished by multiple independent morphological and ecological characters), to evaluate how robust MSC-based delimitation is in such groups. Analysing 64 single-copy nuclear loci in BEAST2 and BPP, we compared constrained, hypothesis-testing approaches (BFD*, BFdriver, A10) with freer, heuristic ones (SPEEDEMON, A11), and examined their sensitivity to data type, clock model, priors, and the species-collapse threshold. All methods recovered the three recognised Mediterranean species, but the resolution of within-lineage structure was method-dependent. The hypothesis-testing approaches consistently supported six lineages, robustly across data types and model assumptions, whereas the heuristic approaches proved less stable. Configurations without a priori species hypotheses often failed to converge or were computationally intractable, a problem compounded by the relaxed clock. In SPEEDEMON the outcome changed with the collapse threshold. Because our system lacks an independent reference point to calibrate this threshold, any delimitation based on it is poorly constrained. We conclude that constrained, hypothesis-testing delimitation is the most robust and reproducible MSC approach, yielding a quantitative, model-based hypothesis that can be weighed against other lines of evidence to inform taxonomic decisions. 
By clarifying how these methods behave and how their outcomes should be interpreted, our study offers a practical guide for researchers working on comparably complex systems.

8

Metabarcoding replicate detection frequency tracks ddPCR copy number for cod and herring eDNA in ancient marine sediments

Banos Lara, E.; Holman, L. E.; Knudsen, S. W.; Bohmann, K.

2026-07-08 genetics 10.64898/2026.07.03.736335 medRxiv

Top 0.2%

8.1%

1. Detecting environmental DNA (eDNA) from rare or low-abundance aquatic species remains a major challenge, particularly when it is highly degraded, present at low concentrations, and dominated by DNA from non-target taxa. These challenges are further amplified in sedimentary ancient DNA (sedaDNA) studies, where thousands of years can degrade eDNA further, making the detection and quantitative interpretation of weak biological signals difficult. 2. Metabarcoding is commonly used to produce high-throughput community-level data from eDNA but is inherently compositional and influenced by amplification biases. Nonetheless, metabarcoding read abundance or PCR replicate detection frequency are increasingly used as proxies for relative DNA concentration, but their quantitative interpretation has rarely been evaluated against independent measures of absolute DNA abundance. 3. We used droplet digital PCR (ddPCR) to quantify mitochondrial DNA from Atlantic cod (Gadus morhua) and Atlantic herring (Clupea harengus) in 136 ancient eDNA extracts from Icelandic marine sediment cores spanning the last three millennia. We compared ddPCR copy number estimates with metabarcoding (18S) derived relative abundance and detection frequency, and evaluated whether temporal DNA trends corresponded with proxy reconstructed sea surface temperature (SST) variability. 4. We found that ddPCR-measured fish sedaDNA abundance was positively correlated with the proportion of metabarcoding PCR replicates for both Atlantic cod and Atlantic herring. Moreover, temporal trends in Atlantic herring DNA abundance were consistent with proxy reconstructed SST variability, supporting the ecological relevance of the molecular signal. 5. Overall, our results show that ddPCR-derived DNA concentrations and metabarcoding PCR replicate detection frequency capture consistent patterns in low-abundance fish sedaDNA from marine sediments. 
The observed agreement between approaches supports the use of PCR replicate detection frequency as a semi-quantitative proxy for low-abundance sedaDNA.
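
PCR replicate detection frequency, the proxy evaluated in this abstract, is the proportion of PCR replicates of a sample in which the target taxon is detected. A minimal sketch follows; the function name and the `min_reads` detection threshold are assumptions, since the abstract does not state how a detection was called.

```python
def detection_frequency(replicate_reads, min_reads=1):
    """Proportion of PCR replicates in which a taxon counts as detected.

    replicate_reads -- read counts for one taxon, one value per PCR replicate
    min_reads       -- hypothetical minimum read count to call a detection
    """
    if not replicate_reads:
        raise ValueError("at least one PCR replicate is required")
    detected = sum(1 for reads in replicate_reads if reads >= min_reads)
    return detected / len(replicate_reads)


# Hypothetical read counts for one taxon across four PCR replicates.
print(detection_frequency([0, 12, 0, 3]))               # any read counts
print(detection_frequency([0, 12, 0, 3], min_reads=5))  # stricter threshold
```

Because the value depends only on presence or absence per replicate, it is less sensitive to amplification bias in read counts than relative read abundance, at the cost of saturating once a taxon is detected in every replicate.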

9

A Draft Male Genome Assembly of the Slipper Lobster (Thenus australiensis) Reveals an XY System and a Validated Diagnostic Marker for Monosex Aquaculture.

Tran Nguyen, A. H.; Ha, G.-H.; Tran, D.-P.; Le, N. T.; Glendining, S.; Fitzgibbon, Q.; Herzig, V.; Luu, P.-L.; Ventura, T.

2026-06-29 genomics 10.64898/2026.06.24.734161 medRxiv

Top 0.3%

7.7%

The slipper lobster (Thenus australiensis) is rapidly emerging as a high-potential species for commercial aquaculture. Because females exhibit superior growth characteristics due to less frequent moulting after sexual maturity, developing monosex breeding strategies is highly desirable for industry profitability. However, the lack of genomic resources and early sex-identification tools has hindered this development. Here, we report the first draft male genome assembly for T. australiensis, generated using a combination of whole-genome shotgun sequencing, DArT-seq, and multi-tissue transcriptomics. The curated assembly spans 0.913 Gbp with high functional completeness (93.0% BUSCO), providing a robust repertoire of 30,100 protein-coding genes. Through k-mer subtraction and population-level DArT-seq genotyping, we provide definitive evidence that T. australiensis utilizes an XX/XY sex-determination system. Crucially, by identifying male-specific structural variations within a neo-Y locus, we developed a diagnostic PCR assay targeting a male-exclusive sequence. This 171 bp marker achieved 100% accuracy in phenotypic sex identification across wild-caught populations. Ultimately, these foundational genomic resources, combined with a highly reliable molecular sexing tool, provide the critical framework necessary for early sex sorting, broodstock management, and the commercial advancement of monosex slipper lobster farming.

10

Insect COI barcoding data as an untapped resource for surveying Wolbachia symbioses

Nowak, K. H.; Buczek, M.; Marszałek, M.; Prus-Frankowska, M.; Valdivia, C.; Deng, J.; Shropshire, J. D.; Łukasik, P.

2026-06-25 ecology 10.64898/2026.06.24.734267 medRxiv

Top 0.3%

7.7%

DNA barcoding of the mitochondrial cytochrome c oxidase I (COI) gene is widely used to characterise insect diversity and distributions; however, its potential to reveal information on species interactions, including host-symbiont associations, remains largely unexplored. Here, we assess whether COI amplicon data can be used to identify Wolbachia - one of the most widely distributed bacterial symbionts known to profoundly affect their hosts' biology. We demonstrate that several commonly used invertebrate COI primer sets perfectly match many reference Wolbachia genomes, leading to frequent co-amplification. By screening 7,901 individual-insect COI amplicon libraries obtained with the popular BF3-BR2 primer set, we detected Wolbachia sequences in over 35% of samples, revealing that co-amplification is indeed widespread. After removing low-abundance reads, Wolbachia detection based on COI amplicons showed over 90% agreement with simultaneously generated 16S-V4 rRNA amplicon data from the same specimens. The degree of agreement, however, varied depending on the thresholds used, among datasets and insect clades. Further, we show that Wolbachia abundance inferred from COI amplicons correlated with their abundance in metagenomic datasets for 152 specimens, supporting the quantitative relevance of the signal. Finally, we find that Wolbachia COI sequences provide greater phylogenetic resolution than 16S-V4 rRNA data (mean pairwise genetic distance of COI sequences - 9.6%, 16S-V4 rRNA - 2.8%), and the reconstructed Wolbachia COI-based genotype network largely agrees with genome-based phylogenies. Collectively, our results demonstrate that off-target Wolbachia sequences recovered from standard insect COI barcoding data may reliably detect symbiont presence, provide phylogenetic insight, and guide sample selection for metagenomics. 
Given the rapid expansion of global insect barcoding initiatives, these findings highlight an opportunity for cost-effective monitoring of their most prevalent bacterial symbionts, offering new perspectives on how host-microbe interactions may shape insect communities.

11

Griphus Software for Multi Panel Figure Composition and Experimentation with Emphasis on Taxonomy

Aguiar, A. P.

2026-07-11 zoology 10.64898/2026.07.07.736512 medRxiv

Top 0.3%

6.8%

The preparation of multi panel figures remains a labor intensive step in scientific publication. Although specific tools are available to solve this problem, they are often highly specialized, difficult to install, or time consuming to learn. Griphus is a standalone graphical application designed for rapid composition and experimentation with multi panel figures, developed by and for zoological taxonomists. Functions specifically designed for multi panel composition include automatic figure numbering and placement, aspect ratio operations, spacers, layout rotation, layout suggestions, and automatic generation of figure legends, including scale bar descriptions. The software can perform both spatial interpretation of images on the canvas and work with a simple, editable layout formula. It also enables instant multi panel composition, with numbered images and automatic contrast selection for the numbers, obtained simply by loading images. User defined parameters such as target printable dimensions, resolution, spacing, and color mode are preserved throughout the work. The program produces coordinated outputs consisting of the final composite figure, a readable file describing the layout structure, and a .gri file storing images, transformations, and parameters for exact regeneration. Griphus is intended as a complementary tool to professional image software, providing a simple and efficient environment for constructing high quality multi panel figures.

12

Epigenetic signatures of infection within and across generations in the endangered Loggerhead sea turtle

Bazely, J. O.; Yen, E. C.; Balard, A.; Gilbert, J. D.; Fairweather, K.; Lopes, A.; Taxonera, A.; Rossiter, S. J.; Eizaguirre, C.

2026-06-30 genetics 10.64898/2026.06.25.734236 medRxiv

Top 0.4%

5.2%

Infection can substantially reduce host fitness and influence population dynamics, yet it is often difficult to detect and quantify in wild animal populations. Molecular tools offer a valuable means of identifying cryptic infection in natural systems. Using whole-genome bisulfite sequencing, we examined whether infection with the parasitic leech Ozobranchus margoi is associated with DNA methylation variation in loggerhead sea turtles (Caretta caretta), while also assessing the potential value of this variation as a biomarker of parasite infection. In nesting females, we identified infection-associated differentially methylated CpG sites associated with genes implicated in immune signalling and cellular regulation. Offspring of infected females also showed infection-associated methylation patterns, despite not being directly exposed to the parasite themselves. Differential methylation analyses identified genes involved in immunity, neurodevelopment and metabolic activity, with limited overlap in associated genes and no overlap in differentially methylated sites between generations. Maternal and offspring genome-wide methylation levels showed a non-linear association that differed subtly with maternal infection status, indicating that infection modifies intergenerational methylation associations. Finally, methylation profiles showed strong discriminatory power for maternal infection status in both maternal and hatchling samples using machine learning models, supporting their potential as candidate biomarkers of cryptic infection. Together, these results show that parasite infection is associated with distinct, generation-specific DNA methylation signatures, and highlight the potential value of epigenetic data for monitoring cryptic infection states in conservation-relevant systems.

13

Field-derived temperature correction compromises eDNA-based abundance inference

Ogonowski, M.; Gerdes, Z.

2026-07-03 ecology 10.64898/2026.07.03.735744 medRxiv

Top 0.5%

4.7%

Environmental DNA (eDNA) has emerged as a promising tool for estimating fish abundance, yet linking eDNA concentration to true density remains a significant challenge in seasonal systems, where the signal is strongly influenced by temperature. We investigated whether eDNA can serve as an abundance index for three-spined stickleback (Gasterosteus aculeatus) in four coastal bays of the Baltic Sea (5.7-20.5 °C, April-July 2023), by pairing eDNA sampling with two trap types of contrasting catchability. Light traps capture fish by phototactic attraction during darkness, so their catchability is driven primarily by night duration rather than temperature, while benthic traps respond to temperature through the same activity-driven mechanism as eDNA production. The temperature sensitivity of eDNA estimated from field data was far higher than physiological expectation (Q10 = 12.4, against a maximum metabolic rate benchmark of Q10 = 3.5), indicating that the field temperature signal reflects ecological change in addition to metabolism. We then compared how well three eDNA predictors tracked a combined trap-based abundance index: uncorrected eDNA, eDNA corrected with the temperature response constrained to the laboratory metabolic rate (a first-principles correction), and eDNA corrected with the response estimated from the field data. Uncorrected and first-principles-corrected eDNA were both strong predictors of abundance (standardised slopes of 0.45 and 0.43), whereas the field-corrected predictor was not (0.08). Uncorrected and first-principles-corrected eDNA performed comparably because temperature and abundance increased together over the season; the first-principles correction is nonetheless preferable, as it remains reliable when this covariation is unknown a priori. 
We conclude that estimating a temperature correction from field data should be avoided in seasonal eDNA monitoring, because it removes the abundance signal together with the temperature effect and assumes a stability in abundance that cannot be verified without independent reference data.
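
The effect of the two Q10 values quoted in this abstract can be made concrete with the standard Q10 relation, rate(T) = rate(T_ref) × Q10^((T − T_ref)/10). The sketch below normalises an eDNA concentration to a reference temperature by dividing out that factor. It is an illustration under stated assumptions: the function name, the 10 °C reference temperature, and the example concentration are hypothetical, and the preprint's actual statistical model is not described in the abstract.

```python
def q10_correct(edna_conc, temp_c, q10, ref_temp_c=10.0):
    """Scale an eDNA concentration to a reference temperature.

    Uses the standard Q10 relation:
        rate(T) = rate(T_ref) * q10 ** ((T - T_ref) / 10)
    so the corrected value is the observed value divided by that factor.
    ref_temp_c is an arbitrary, assumed reference temperature.
    """
    return edna_conc / q10 ** ((temp_c - ref_temp_c) / 10.0)


def seasonal_factor(q10, t_min_c=5.7, t_max_c=20.5):
    """Fold-change in expected eDNA production across the temperature
    range reported in the abstract, for a given Q10."""
    return q10 ** ((t_max_c - t_min_c) / 10.0)


# Hypothetical observation: 100 copies per litre measured at 20 degrees C.
print(q10_correct(100.0, 20.0, q10=3.5))   # first-principles correction
print(q10_correct(100.0, 20.0, q10=12.4))  # field-derived correction

# Fold-change implied across the 5.7-20.5 degree C range for each Q10.
print(seasonal_factor(3.5), seasonal_factor(12.4))
```

Across the reported temperature range the metabolic benchmark implies roughly a 6-fold change in eDNA production, whereas the field-derived value implies roughly a 40-fold change, so dividing by the field-derived factor removes far more of the seasonal signal. That is consistent with the abstract's point that the field-derived correction strips out abundance along with temperature.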

14

Full-length COI barcodes improve eDNA metabarcoding data denoising relative to mini-barcodes

Eisele, M. H.; Varusk, S.; Sammet, K.; Hakimzadeh, A.; Metsoja, M.; Tedersoo, L.; Alwutayd, K. M.; Arribas, P.; Andujar, C.; Emerson, B. C.; Anslan, S.

2026-07-03 ecology 10.64898/2026.07.03.736260 medRxiv

Top 0.5%

4.7%

Animal COI (mitochondrial cytochrome oxidase I) metabarcoding of environmental DNA (eDNA) is increasingly used to assess biodiversity in complex substrates such as soil. However, due to read-length constraints of second-generation sequencing platforms, mini-barcodes have been used instead of the full barcode region. Long-read sequencing technologies now enable the recovery of full-length barcode sequences, and are more commonly applied for studying microbes, but their use for metabarcoding the full-length standard COI barcoding region in animals remains limited. In this study, we compared three COI amplicon sets -- 313 bp, 660 bp, and 1,256 bp -- amplified from soil eDNA samples and sequenced using Illumina and PacBio platforms to evaluate their overall concurrence, the effectiveness of identifying nuclear mitochondrial DNA segments (NUMTs) and chimeras, as well as their respective taxonomic resolution. The long-read datasets exhibited a higher identification rate of NUMTs and true chimeras, suggesting that longer sequences improve the detection of noise in COI metabarcoding data, thereby reducing the occurrence of spurious taxa. Taxonomy assignment confidence was similar between the 313 bp and 660 bp datasets, whereas extending the amplicon beyond the standard COI barcode region (1,256 bp) reduced confidence, likely because longer reads extend into regions poorly represented in barcode reference databases. Despite substantially lower sequencing depth in the 660 bp dataset, per-sample OTU richness did not differ significantly from that recovered with the Illumina 313 bp amplicon set. Similarly, the relationships between samples were strongly correlated across the detected OTU communities, indicating consistent ecological interpretations between short and long amplicons. 
We conclude that the standard ~658 bp COI barcode is an optimal marker for soil animal metabarcoding from eDNA, balancing target recovery, artifact detection, taxonomic assignment and ecological interpretability. As COI eDNA metabarcoding becomes increasingly used in biodiversity assessment and is increasingly adopted in large-scale monitoring initiatives, this study provides methodological guidance for improving the robustness of soil animal community biomonitoring.

15

First genetic detection and ongoing eDNA monitoring of the golden mussel (Limnoperna fortunei) in California

Stinson, S. A.; Fiske, A.; Funk, E. C.; Kulig, E.; Brown, S.; Gille, D.; Schreier, A.; Sanders, L.; Nagarajan, R. P.; Barney, B.; Baerwald, M.

2026-06-23 genetics 10.64898/2026.06.18.733028 medRxiv

Top 0.5%

4.2%

Here, we report the first genetic confirmation of golden mussels (Limnoperna fortunei) in North America, and the subsequent development, optimization, and deployment of golden mussel eDNA monitoring procedures. Aquatic species invasions are economically costly, disrupt ecosystem functionality, and impact native aquatic communities. Early detection of new invasive species enables rapid response via implementation of effective eradication or control measures and is key for reducing harmful outcomes. Initial species detection and taxonomic identification can be aided by genetic methods that have high detection sensitivity and accuracy. Genetic methods such as environmental DNA (eDNA) sampling can be used to detect invasive species before they become established in new systems, providing an early alert system to inform resource managers. Golden mussels were first detected in North America in October 2024 near the Port of Stockton in the San Francisco Estuary (SFE). The SFE is particularly vulnerable to invasion due to the access and connectivity provided by the presence of engineering infrastructure and shipping lanes. Collaborative efforts between public agencies and academic institutions are underway to develop a coordinated detection and response plan. Early detection followed by a rapid response is the best defense against prolific invasive species, such as the golden mussel.

16

DATRASextra: An R package for streamlined workflows with ICES DATRAS bottom-trawl survey data

Mildenberger, T. K.; Maioli, F.; Berg, C. W.

2026-06-30 ecology 10.64898/2026.06.29.735240 medRxiv

Top 0.5%

4.0%

Scientific bottom-trawl surveys provide essential fisheries-independent data for fisheries and ecosystem research. In the Northeast Atlantic, the ICES Database of Trawl Surveys (DATRAS) compiles haul-level information, species- and length-specific catch data, and individual biological observations across multiple long-term surveys. However, reproducible workflows for processing and integrating these relational datasets remain challenging. We present DATRASextra, an open-source R package that provides modular end-to-end workflows for accessing, cleaning, harmonising, quality-controlling, and analysing DATRAS survey data. The package supports derivation of standardised haul-level survey variables, integration of multiple surveys, and generation of analysis-ready datasets for downstream applications including stock assessment, biodiversity analyses, and large-scale synthesis efforts such as FishGlob.

17

Nemo2.4: fast and accurate quantitative genetics forward-time simulations

Guillaume, F.; Cotto, O.; Chebib, J.; Beeravolu Reddy, C.; Schmid, M.

2026-07-08 evolutionary biology 10.64898/2026.07.02.736177 medRxiv

Top 0.6%

3.9%

We present Nemo 2.4, an advanced forward-time individual-based simulation framework designed to model the complex eco-evolutionary dynamics and genetic basis of quantitative traits. This tool addresses current challenges in evolutionary quantitative genetics by providing unprecedented flexibility and computational efficiency. Nemo 2.4's modular architecture allows researchers to design custom life cycles by combining specialized Life Cycle Event (LCE) modules, from reproduction and dispersal to selection, crossing, and phenotype expression. The software supports diverse population models, including both Wright-Fisher (WF) and non-WF dynamics, spatially explicit models, and varying demography. Nemo 2.4 handles a wide range of genetic architectures, including both multi-allelic Quantitative Trait Loci (QTL) for general trait studies, and dense di-allelic Quantitative Trait Nucleotides (QTN) implemented with highly optimized bit-wise data structures. Crucially, it allows the simulation of QTNs on comprehensive genetic maps that incorporate other genetic elements, providing genomic-scale resolution. Key biological complexities are integrated natively: the model accommodates modular pleiotropy, dominance, and pairwise epistasis across multiple traits, facilitating the study of complex genotype-phenotype mappings. Furthermore, Nemo 2.4 models phenotypic plasticity through reaction norms and incorporates underlying liability thresholds, enabling the simulation of environmental influences on trait evolution with various forms of selection (e.g., Gaussian, linear, truncation). Due to its compiled design and memory-efficient data representations for large numbers of loci, Nemo provides a robust platform for running high-throughput simulations critical for testing theoretical predictions in polygenic adaptation and understanding evolutionary responses to changing environments.

18

Confounding effects of inferring gene co-expression networks from pooled data from different biological populations

Runghen, R.; Eliassi-Rad, T.; Bolnick, D. I.

2026-06-29 bioinformatics 10.64898/2026.06.23.734063 medRxiv

Top 0.6%

3.5%

Weighted Gene Co-expression Network Analysis (WGCNA) is routinely applied to pooled datasets from multiple biological populations, genotypes, or treatment groups, implicitly assuming a shared module structure across groups. While the distortion of pairwise correlations by pooling heterogeneous groups is well established statistically, three aspects of this problem have received little systematic attention in the context of co-expression network analysis: the extent to which pooling disrupts the discrete module-level community structure inferred by WGCNA; whether this disruption is detectable from the global topology metrics researchers routinely report; and how prevalent the pooling practice is in published multi-group WGCNA studies. Using analytical toy examples and a four-scenario simulation framework, we address all three questions. Module preservation Zsummary scores declined progressively with between-population divergence, from full preservation under identical populations (mean median Zsummary = 25.2 ± 3.3, 95% interval 19.0–30.7 across 20 simulation replicates) to substantial disruption when both network structure and mean expression differed (mean median Zsummary = 11.9 ± 1.0, 95% interval 10.2–13.5). This disruption was undetectable from global topology metrics: modularity and clustering coefficient remained stable across all scenarios, while edge density was sensitive but non-specific. These findings were corroborated in an empirical reanalysis of divergent lake and stream stickleback transcriptomes, where merged analysis collapsed 26 lake-specific and 59 stream-specific modules into only 19 merged modules. A survey of 100 publications found that 78.7% (95% CI 69.4–87.9%) of multi-group WGCNA studies with sufficient methodological reporting used a single merged analysis. Results were robust across network sizes of 250–1,000 genes and rewiring rates of 10–50%. We provide concrete recommendations including module preservation testing in both directions, population-specific baseline networks, and consensus WGCNA as a principled alternative.
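
The pairwise-correlation distortion this abstract refers to can be illustrated with a minimal sketch. This is not the authors' simulation framework: the group names, sample sizes, mean shift, and helper functions below are hypothetical choices made only for illustration. Two genes are simulated as independent within each population, but both have a higher mean in one population, so pooling the samples produces a strong correlation that exists in neither group alone.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_group(rng, n_samples, mean_shift):
    """Two genes drawn independently; only their shared mean differs by group."""
    gene_a = [rng.gauss(mean_shift, 1.0) for _ in range(n_samples)]
    gene_b = [rng.gauss(mean_shift, 1.0) for _ in range(n_samples)]
    return gene_a, gene_b


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical populations: same (absent) co-expression, different mean expression.
lake_a, lake_b = simulate_group(rng, n_samples=200, mean_shift=0.0)
stream_a, stream_b = simulate_group(rng, n_samples=200, mean_shift=5.0)

r_lake = pearson(lake_a, lake_b)
r_stream = pearson(stream_a, stream_b)
# Pooling concatenates samples from both groups before computing the correlation.
r_pooled = pearson(lake_a + stream_a, lake_b + stream_b)

print(f"within lake:   r = {r_lake:+.2f}")
print(f"within stream: r = {r_stream:+.2f}")
print(f"pooled:        r = {r_pooled:+.2f}")
```

The within-group correlations stay near zero while the pooled correlation is large and positive, because between-group mean differences dominate the pooled covariance. A network built from the pooled matrix would therefore connect these two genes even though neither population shows co-expression.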

19

InsectDCT: A generalized pipeline for detection, taxonomic classification, and tracking of insects in camera-trap recordings

Bjerge, K.; Wogram, S. F. A.; Serra-Marin, P. E.; Sakhiashvili, O.; Hoye, T. T.

2026-07-10 ecology 10.64898/2026.07.07.736939 medRxiv

Top 0.6%

3.3%

Automated monitoring of insect pollinators in natural environments with insect camera traps and trained deep learning algorithms provides novel data for insect ecological studies. However, efficient and accurate image recognition analysis of the recorded images or videos is challenging, particularly for images containing small insects against complex backgrounds with diverse vegetation communities. Even when insects can be detected in images, identifying their taxonomy remains difficult, particularly in footage with low image resolution, variable light conditions, and varying distances from the plants, and in cases where insects appear blurry or only partially visible. In this work, we present InsectDCT, an AI-based pipeline for automated detection, hierarchical classification, and tracking of insects in footage of natural vegetation, tested in different environments. The InsectDCT pipeline consists of three stages: insect Detection and localization, hierarchical taxonomic Classification, and spatio-temporal Tracking. In the first stage, insects are detected in time-lapse images or video recordings using the You Only Look Once (YOLO11) object detection architecture. Motion-enhanced images improve detection performance and robustness in cluttered, three-dimensional environments. The detector is trained on an extensive dataset that contains more than 60,000 images collected using camera traps deployed across a wide range of plant families and floral habitats. In the second stage, detected insects are classified using a hierarchical taxonomy-aware classification framework that covers 80 taxonomic groups. Classification is performed at multiple taxonomic levels, including order, family, and genus/species, allowing coarse and fine-grained ecological analyses while accounting for varying levels of visual ambiguity. In the third stage, a multi-object tracking module is applied to high temporal-resolution image sequences and video data to associate detections of the same individual across time. InsectDCT code and all datasets are made publicly available.

Author summary: Insects are declining worldwide, creating an urgent need for efficient methods to monitor their abundance, activity, and diversity. Traditional insect surveys often require extensive fieldwork and expert taxonomic identification, which limits the scale and frequency of monitoring. In this study, we developed InsectDCT, an artificial intelligence-based pipeline that automatically detects, classifies, and tracks insects in camera-trap recordings collected from natural and semi-natural environments. Our approach combines deep-learning methods for object detection, hierarchical taxonomic classification, and tracking of individual insect observations through time. Unlike many existing systems that are trained for a single habitat or plant species, we designed our framework using images collected across a wide range of flowering plants, camera systems, and insect groups. This makes the system more transferable to new ecological settings. The classifier can identify insects at multiple taxonomic levels and can return higher-level classifications when species-level identification is uncertain. We demonstrate that the pipeline can process large image datasets efficiently, including on low-power edge-computing devices such as Raspberry Pi systems. By providing both the software and the underlying datasets, we aim to support scalable, non-invasive insect monitoring and facilitate future ecological and conservation research.

20

Cas12a-Targeted Multiplexed Nanopore Sequencing

Rueegg, A. B.; Gehrold, R.; Agathos, K.; Chun, S.; Baur, A.; Pelczar, P.

2026-07-07 molecular biology 10.64898/2026.07.06.736710 medRxiv

Top 0.6%

3.3%

Targeted long read sequencing (LRS) of native genomic DNA (gDNA) using Oxford Nanopore Technologies (ONT) is an economically and computationally accessible method for sequencing selected genomic regions without the limitations associated with amplification-based approaches. At present, efficiency, multiplexing, and scalability remain key challenges for existing targeted LRS methods. We have developed Cas12a-Targeted Multiplexed Nanopore Sequencing (CTM-nSeq), which combines Cas12a-targeting, DNA fragment enrichment, and optimized adapter ligation using T7 DNA ligase. Unlike previously established protocols, CTM-nSeq is compatible with the latest ONT flow cell chemistry. Performing CTM-nSeq on a single sample with an R10.4 MinION flow cell routinely yields hundreds of on-target reads. Furthermore, CTM-nSeq enables targeting of multiple loci and is the first targeted ONT sequencing method to allow reliable, barcode-assisted multiplexing. CTM-nSeq is an efficient and accessible method for sequencing native gDNA and analysing DNA methylation, repeat expansions, and sequence integrity. As such, CTM-nSeq has a wide range of analytical and diagnostic applications.